Optical flow, which computes the apparent motion between a pair of video frames, is a critical tool for scene motion estimation. The correlation volume is the central component of neural optical flow models: it estimates the pairwise matching costs between cross-frame features and is then used to decode optical flow. However, the traditional correlation volume is frequently noisy, outlier-prone, and sensitive to motion blur. We observe that, although the recent RAFT algorithm also adopts the traditional correlation volume, its additional context encoder provides semantically representative features to the flow decoder, implicitly compensating for the deficiency of the correlation volume. However, the benefits of this context encoder have barely been discussed or exploited. In this paper, we first investigate the functionality of RAFT's context encoder and then propose a new Context Guided Correlation Volume (CGCV) built on gating and lifting schemes. CGCV can be universally integrated with RAFT-based flow computation methods for enhanced performance, and is especially effective in the presence of motion blur, defocus blur, and atmospheric effects. By incorporating the proposed CGCV into the previous Global Motion Aggregation (GMA) method, at a minor cost of 0.5% extra parameters, the rank of GMA is lifted by 23 places on the KITTI 2015 leaderboard and by 3 places on the Sintel leaderboard. Moreover, at a similar model size, our correlation volume achieves competitive or superior performance to state-of-the-art peer supervised models that employ Transformers or graph reasoning, as verified by extensive experiments.
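Below is a minimal PyTorch-style sketch of the gating-and-lifting idea behind a context-guided correlation volume: sampled correlation features are modulated by a gate predicted from the context-encoder features and augmented with a lifted context projection before flow decoding. Module names, layer choices, and tensor shapes are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ContextGatedCorrelation(nn.Module):
    """Hypothetical sketch: gate and lift correlation features with context features."""
    def __init__(self, corr_channels: int, context_channels: int):
        super().__init__()
        # Predict a per-pixel, per-channel gate from the context features.
        self.gate = nn.Sequential(
            nn.Conv2d(context_channels, corr_channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # "Lifting" branch: project context into the correlation space and add it.
        self.lift = nn.Conv2d(context_channels, corr_channels, kernel_size=1)

    def forward(self, corr: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # corr:    (B, C_corr, H, W) sampled correlation features
        # context: (B, C_ctx,  H, W) features from the context encoder
        return corr * self.gate(context) + self.lift(context)
```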
Deep-learning-based melanoma classification with dermoscopic images has recently shown great potential for automated early melanoma diagnosis. However, limited by pronounced data imbalance and conspicuous extraneous artifacts, i.e., hairs and ruler markings, discriminative feature extraction from dermoscopic images is very challenging. In this study, we seek to address these problems separately so as to better represent lesion features. Specifically, a GAN-based data augmentation (GDA) strategy is used to generate synthetic melanoma-positive images, in conjunction with the proposed implicit hair denoising (IHD) strategy, wherein hair-related representations are implicitly disentangled via an auxiliary classifier network and reversely sent to the melanoma-feature extraction backbone for better melanoma-specific representation learning. In addition, to train the IHD module, hair noise is further annotated on the ISIC2020 dataset, making it the first large-scale dermoscopic dataset with annotations of such hair-like artifacts. Extensive experiments demonstrate the superiority of the proposed framework as well as the effectiveness of each component. The improved dataset is publicly available at https://github.com/kirtsy/dermoscopicdataset.
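One plausible way to realize the "reversely sent" auxiliary hair classifier is a gradient-reversal layer, sketched below in PyTorch. Whether IHD uses gradient reversal exactly, as well as the head architecture shown, are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips (and scales) gradients in the backward pass."""
    @staticmethod
    def forward(ctx, x, alpha):
        ctx.alpha = alpha
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.alpha * grad_output, None

class HairAdversarialHead(nn.Module):
    """Hypothetical auxiliary classifier predicting hair-artifact presence from backbone features."""
    def __init__(self, feat_dim: int, alpha: float = 1.0):
        super().__init__()
        self.alpha = alpha
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(feat_dim, 1)
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Gradient reversal discourages hair-related cues in the shared backbone features.
        reversed_feat = GradReverse.apply(features, self.alpha)
        return self.classifier(reversed_feat)  # logits for hair / no-hair
```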
Recently, deep neural networks have greatly advanced undersampled magnetic resonance image (MRI) reconstruction, where most studies follow a one-network-per-anatomy fashion, i.e., each expert network is trained and evaluated on a specific anatomy. Apart from the inefficiency of training multiple independent models, such a convention also overlooks the de-aliasing knowledge shared across anatomies, which could benefit each other. To explore the shared knowledge, a naive way is to combine all data from the various anatomies and train an all-round network. Unfortunately, despite the existence of common de-aliasing knowledge, we reveal that the exclusive knowledge of different anatomies can deteriorate specific reconstruction targets, leading to degraded overall performance. Based on this observation, in this study we propose a novel deep MRI reconstruction framework with anatomy-shared and anatomy-specific parameterized learners, aiming to "seek common ground while reserving the differences" across anatomies. In particular, the primary anatomy-shared learners are exposed to different anatomies to model the common knowledge, while the efficient anatomy-specific learners are trained on the target anatomy for the exclusive knowledge. Four different implementations of the anatomy-specific learners are presented and explored on top of our framework in two MRI reconstruction networks. Comprehensive experiments on brain, knee, and cardiac MRI datasets demonstrate that three of these learners are able to enhance reconstruction performance via multi-anatomy collaborative learning.
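A minimal sketch, assuming a convolutional reconstruction backbone, of the anatomy-shared plus anatomy-specific idea: one shared convolution is trained across all anatomies, while a lightweight per-anatomy adapter (a 1x1 convolution here, just one of several conceivable implementations) carries the exclusive knowledge. All names and layer choices are illustrative, not the paper's parameterization.

```python
import torch
import torch.nn as nn

class SharedSpecificBlock(nn.Module):
    """Hypothetical block combining an anatomy-shared learner with per-anatomy adapters."""
    def __init__(self, channels: int, anatomies=("brain", "knee", "cardiac")):
        super().__init__()
        self.shared = nn.Conv2d(channels, channels, 3, padding=1)   # trained on all anatomies
        self.specific = nn.ModuleDict({                              # trained per anatomy
            name: nn.Conv2d(channels, channels, 1) for name in anatomies
        })
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor, anatomy: str) -> torch.Tensor:
        # Common knowledge from the shared path plus exclusive knowledge from the adapter.
        return self.act(self.shared(x) + self.specific[anatomy](x))
```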
Existing instance segmentation methods have achieved impressive performance but still suffer from a common dilemma: redundant representations (e.g., multiple boxes, grids, and anchor points) are inferred for a single instance, which leads to multiple duplicated predictions. Mainstream methods therefore usually rely on a hand-designed non-maximum suppression (NMS) post-processing step to select the optimal prediction, which hinders end-to-end training. To address this issue, we propose a box-free and NMS-free end-to-end instance segmentation framework, termed UniInst, that yields only one unique representation for each instance. Specifically, we design an instance-aware one-to-one assignment scheme, namely Only Yield One Representation (OYOR), which dynamically assigns one unique representation to each instance according to the matching quality between predictions and ground truths. A novel prediction re-ranking strategy is then elegantly integrated into the framework to address the misalignment between classification score and mask quality, making the learned representations more discriminative. With these techniques, our UniInst, the first FCN-based box-free and NMS-free instance segmentation framework, achieves competitive performance against mainstream methods on COCO test-dev with ResNet-50-FPN and ResNet-101-FPN backbones, e.g., 40.2 mask AP with ResNet-101-FPN. Moreover, the proposed instance-aware method is robust to occlusion scenes, outperforming common baselines by a remarkable mask AP margin on the heavily occluded OCHuman benchmark. Our code will be made available upon publication.
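The sketch below illustrates one way a quality-driven one-to-one assignment could work in the spirit of OYOR: each ground truth is greedily matched to a single prediction according to a matching-quality score, and every other prediction is treated as background. The quality formula, the greedy (rather than Hungarian) matching, and all names are assumptions, not the paper's algorithm.

```python
import torch

def one_to_one_assign(cls_scores: torch.Tensor, mask_ious: torch.Tensor,
                      alpha: float = 0.8) -> torch.Tensor:
    """
    cls_scores, mask_ious: (num_preds, num_gts) matrices of matching evidence.
    Returns assigned_gt: (num_preds,) with the matched gt index or -1 (background).
    """
    quality = cls_scores ** (1 - alpha) * mask_ious ** alpha  # illustrative quality metric
    assigned_gt = torch.full((quality.shape[0],), -1, dtype=torch.long)
    used_preds = set()
    # Greedy matching over descending quality; each prediction and each gt is used at most once.
    for flat_idx in torch.argsort(quality.flatten(), descending=True):
        p, g = divmod(int(flat_idx), quality.shape[1])
        if p in used_preds or (assigned_gt == g).any():
            continue
        assigned_gt[p] = g
        used_preds.add(p)
    return assigned_gt
```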
Accurately estimating the value function in deep reinforcement learning (DRL) is crucial, so that the agent can take appropriate actions rather than suboptimal ones. However, existing actor-critic methods suffer, more or less, from underestimation bias or overestimation bias, which negatively affects their performance. In this paper, we reveal a simple but effective principle: proper value correction benefits bias alleviation. Accordingly, we propose the generalized-activated weighting operator, which uses any non-decreasing function, i.e., an activation function, as the weight for better value estimation. In particular, we integrate the generalized-activated weighting operator into value estimation and introduce a novel algorithm, Generalized-activated Deep Double Deterministic Policy Gradients (GD3). We theoretically show that GD3 is capable of alleviating the potential estimation bias. We interestingly find that simple activation functions already lead to satisfying performance with no additional tricks, and can contribute to faster convergence. Experimental results on numerous challenging continuous control tasks show that GD3 with task-specific activations outperforms common baseline methods. We also find that fine-tuning polynomial activation functions achieves superior results on most of the tasks.
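As a worked illustration, one plausible form of a generalized-activated weighting operator is a weighted average of candidate Q-values, with weights given by a non-decreasing activation g. The sketch below shows only this form and does not reproduce how GD3 combines it with double critics; the candidate-action sampling and default choices of g are assumptions.

```python
import torch

def generalized_activated_value(q_values: torch.Tensor, g=torch.exp,
                                eps: float = 1e-8) -> torch.Tensor:
    """
    q_values: (batch, num_candidate_actions) Q estimates for sampled candidate actions.
    g: any non-decreasing activation; exp recovers a softmax-like operator, and a
       polynomial such as `lambda q: q.clamp(min=0) ** 2` is another possible choice.
    Returns a (batch,) weighted value estimate where larger Q-values get larger weights.
    """
    w = g(q_values)
    return (w * q_values).sum(dim=-1) / (w.sum(dim=-1) + eps)
```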
We propose a novel implicit feature refinement module for high-quality instance segmentation. Existing image/video instance segmentation methods rely on explicitly stacked convolutions to refine instance features before the final prediction. In this paper, we first conduct an empirical comparison of different refinement strategies, which reveals that the widely used four consecutive convolutions are not necessary. As an alternative, a weight-sharing convolution block provides competitive performance. When this block is iterated an infinite number of times, the block output eventually converges to an equilibrium state. Based on this observation, implicit feature refinement (IFR) is developed by constructing an implicit function, and the equilibrium state of the instance features can be obtained by fixed-point iteration through a simulated infinite-depth network. Our IFR enjoys several advantages: 1) it simulates an infinite-depth refinement network while only requiring the parameters of a single residual block; 2) it produces high-level equilibrium instance features with a global receptive field; 3) it serves as a plug-and-play general module that is easily extended to most object recognition frameworks. Experiments on the COCO and YouTube-VIS benchmarks show that our IFR improves the performance of state-of-the-art image/video instance segmentation frameworks while reducing the parameter burden (e.g., 1% AP improvement on Mask R-CNN with only 30.0% of the parameters in the mask head). Code is available at https://github.com/lufanma/ifr.git
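A minimal sketch of the fixed-point view: a single weight-shared residual block is applied repeatedly until the refined features stop changing, approximating the equilibrium of an infinitely deep refinement network. Plain unrolling is shown for simplicity (implicit differentiation for the backward pass is not reproduced); iteration counts, tolerances, and layer choices are assumptions.

```python
import torch
import torch.nn as nn

class WeightSharedRefiner(nn.Module):
    """Hypothetical weight-shared refinement block iterated toward a fixed point z = f(z; x)."""
    def __init__(self, channels: int, max_iters: int = 20, tol: float = 1e-3):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.max_iters, self.tol = max_iters, tol

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = torch.zeros_like(x)
        for _ in range(self.max_iters):             # fixed-point iteration
            z_next = torch.relu(self.block(z) + x)  # residual block conditioned on the input features
            if (z_next - z).abs().max() < self.tol:
                return z_next                       # approximate equilibrium reached
            z = z_next
        return z
```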
Multi-goal reinforcement learning is widely applied in planning and robot manipulation. Two main challenges in multi-goal reinforcement learning are sparse rewards and sample inefficiency. Hindsight Experience Replay (HER) aims to tackle both challenges via goal relabeling. However, HER-related works still require millions of samples and huge computation. In this paper, we propose Multi-step Hindsight Experience Replay (MHER), which incorporates multi-step relabeled returns based on $n$-step relabeling to improve sample efficiency. Despite the advantages of $n$-step relabeling, we theoretically and experimentally prove that the off-policy $n$-step bias introduced by $n$-step relabeling may lead to poor performance in many environments. To address this issue, two bias-reduced MHER algorithms, MHER($\lambda$) and Model-based MHER (MMHER), are proposed. MHER($\lambda$) exploits the $\lambda$ return, while MMHER benefits from model-based value expansions. Experimental results on numerous multi-goal robotic tasks show that our solutions can successfully alleviate the $n$-step bias and achieve significantly higher sample efficiency than HER and Curriculum-guided HER, with little additional computation.
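For concreteness, the sketch below writes out the standard $n$-step return used by $n$-step relabeling and a $\lambda$-weighted blend of such returns in the spirit of MHER($\lambda$). Rewards are assumed to have been recomputed after hindsight goal relabeling, and the truncation at the end of the trajectory is simplified; function names and defaults are illustrative.

```python
def n_step_return(rewards, bootstrap_value, gamma=0.98, n=3):
    """G_t^(n) = sum_{i=0}^{n-1} gamma^i * r_{t+i} + gamma^n * V(s_{t+n})."""
    n = min(n, len(rewards))
    ret = sum(gamma ** i * rewards[i] for i in range(n))
    return ret + gamma ** n * bootstrap_value

def lambda_return(rewards, values, gamma=0.98, lam=0.7):
    """Exponentially weighted mixture of n-step returns; values[i] ~ V(s_{t+i+1}).
    Computed backwards: G_i = r_i + gamma * ((1 - lam) * V(s_{i+1}) + lam * G_{i+1})."""
    g = 0.0
    for i in reversed(range(len(rewards))):
        g = rewards[i] + gamma * ((1 - lam) * values[i] + lam * g)
    return g
```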
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point-cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
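A hedged sketch of the token-level view of CMT: image tokens and point-cloud tokens (each assumed to already carry position encodings derived from 3D coordinates) are concatenated, and object queries attend to them through a standard transformer decoder to predict classes and 3D boxes. Query counts, head dimensions, and output sizes are assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class CrossModalDecoder(nn.Module):
    """Hypothetical decoder: object queries attend to concatenated image and point-cloud tokens."""
    def __init__(self, dim: int = 256, num_queries: int = 900, num_layers: int = 6):
        super().__init__()
        self.queries = nn.Embedding(num_queries, dim)
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.box_head = nn.Linear(dim, 10)   # e.g., center, size, yaw, velocity (assumed layout)
        self.cls_head = nn.Linear(dim, 10)   # e.g., 10 object classes (assumed)

    def forward(self, img_tokens: torch.Tensor, pts_tokens: torch.Tensor):
        # img_tokens: (B, N_img, dim), pts_tokens: (B, N_pts, dim); both are assumed to
        # already include 3D position embeddings (not shown here).
        tokens = torch.cat([img_tokens, pts_tokens], dim=1)
        q = self.queries.weight.unsqueeze(0).expand(tokens.size(0), -1, -1)
        out = self.decoder(q, tokens)
        return self.cls_head(out), self.box_head(out)
```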
The development of social media user stance detection and bot detection methods relies heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which suppresses graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built on the largest collection of original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extract the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we perform a thorough evaluation of MGTAB and other public datasets. Our experiments show that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
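A minimal sketch, assuming a flat feature matrix, of selecting the top-k user property features by information gain (estimated here with scikit-learn's mutual information); the actual MGTAB feature pipeline is not reproduced, and the function name and data layout are assumptions.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def top_k_features(X: np.ndarray, y: np.ndarray, feature_names, k: int = 20):
    """Return the k feature names with the highest mutual information with the label y."""
    gains = mutual_info_classif(X, y, random_state=0)  # information-gain estimate per column
    order = np.argsort(gains)[::-1][:k]                # indices of the k most informative features
    return [feature_names[i] for i in order]
```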
Learning feature interactions is the key to success for large-scale CTR prediction and recommendation. In practice, handcrafted feature engineering usually requires exhaustive searching. To reduce the high cost of human effort in feature engineering, researchers have proposed several deep neural network (DNN)-based approaches that learn feature interactions in an end-to-end fashion. However, existing methods either do not learn both vector-wise and bit-wise interactions simultaneously, or fail to combine them in a controllable manner. In this paper, we propose a new model, xDeepInt, based on a novel network architecture called the polynomial interaction network (PIN), which learns higher-order vector-wise interactions recursively. By integrating a subspace-crossing mechanism, we enable xDeepInt to balance the mixture of vector-wise and bit-wise feature interactions at a bounded order. On top of the network architecture, we customize a combined optimization strategy to conduct feature selection and interaction selection. We implement the proposed model and evaluate its performance on three real-world datasets. Our experimental results demonstrate the efficacy and effectiveness of xDeepInt over state-of-the-art models. We open-source the TensorFlow implementation of xDeepInt: https://github.com/yanyachen/xDeepInt.
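The sketch below shows one plausible form of a recursive vector-wise interaction layer in the spirit of a polynomial interaction network: the current representation interacts via a Hadamard product with a field-wise mixture of the input embeddings, raising the polynomial order by one per layer. The exact PIN formulation and the subspace-crossing mechanism are not reproduced; all shapes and names are assumptions, and the sketch is in PyTorch even though the authors release a TensorFlow implementation.

```python
import torch
import torch.nn as nn

class PolynomialInteractionLayer(nn.Module):
    """Hypothetical recursive vector-wise interaction layer."""
    def __init__(self, num_fields: int):
        super().__init__()
        # Field-level mixing weights: vector-wise, shared across embedding dimensions.
        self.W = nn.Parameter(torch.randn(num_fields, num_fields) * 0.01)

    def forward(self, x_l: torch.Tensor, x_0: torch.Tensor) -> torch.Tensor:
        # x_l, x_0: (batch, num_fields, embed_dim); x_0 holds the raw field embeddings.
        mixed = torch.einsum("fg,bgd->bfd", self.W, x_0)  # weighted sum over fields
        return x_l + x_l * mixed                           # residual connection, order + 1
```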